1. A method of designing a hardware threaded circuit architecture, comprising: 

determining a total area available for processing elements; 

determining a set of task arrival times for tasks to be processed by the processing 
elements; 

determining a number of possible implementations for the processing elements 
within the area available, each of the possible implementations having a corresponding 
number of processing elements; 

interconnecting first and second ones of the processing elements; 

determining overall system wait times for the possible implementations; and 

selecting a first one of the possible implementations based upon the overall system 
wait times. 

2. The method according to claim 1, further including determining an average steady 
state time the tasks spend in queue and/or an average steady state time the tasks spend in 
the processing elements. 
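
The wait-time comparison in claims 1 and 2 can be illustrated with a small queueing sketch. The claims do not name a queueing model; the code below assumes an M/M/c queue (Poisson task arrivals, exponential service, c identical processing elements) purely for illustration. The function names and the candidate list are hypothetical, and the per-element service rates are made-up values standing in for the idea that more elements in a fixed area means smaller, slower elements.

```python
import math

def mmc_wait_times(arrival_rate, service_rate, servers):
    """Steady-state averages for an M/M/c queue (an assumed model).

    Returns (time in queue, time in a processing element, total system time).
    """
    time_in_service = 1.0 / service_rate
    offered_load = arrival_rate / service_rate
    utilization = offered_load / servers
    if utilization >= 1.0:
        # The queue grows without bound; no steady state exists.
        return math.inf, time_in_service, math.inf
    # Erlang C: probability that an arriving task has to wait.
    head = sum(offered_load**k / math.factorial(k) for k in range(servers))
    tail = offered_load**servers / (math.factorial(servers) * (1.0 - utilization))
    p_wait = tail / (head + tail)
    time_in_queue = p_wait / (servers * service_rate - arrival_rate)
    return time_in_queue, time_in_service, time_in_queue + time_in_service

def select_implementation(arrival_rate, candidates):
    """Pick the (num_elements, service_rate) pair with the lowest total wait."""
    return min(
        candidates,
        key=lambda c: mmc_wait_times(arrival_rate, c[1], c[0])[2],
    )

# Hypothetical candidates that all fit the same total area.
candidates = [(1, 4.0), (2, 2.2), (4, 1.2)]
best = select_implementation(3.0, candidates)
print("selected implementation:", best)
```

Under these assumed numbers the intermediate configuration wins: splitting the area further shortens the queue but lengthens the time each task spends inside a slower element.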

3. The method according to claim 1, further including scheduling utilization of the 
processing elements to process the tasks. 

4. The method according to claim 1, further including determining a number of the 
processing elements to be interconnected together in a hardware threaded arrangement. 

5. The method according to claim 1, further including determining a state-based flow for 
an application to be processed by the circuit. 

6. The method according to claim 5, further including determining a number of pipeline 
stages based upon the state-based flow for the application. 

7. The method according to claim 6, further including generating a threaded schedule that 
can include parallel processing of the pipeline stages by at least two of the processing 
elements. 
8. The method according to claim 7, further including reducing a frequency of operation 
and meeting a predetermined overall system wait time. 

9. The method according to claim 8, wherein the predetermined overall system wait time 
corresponds to an overall system wait time associated with non-threaded processing. 

10. The method according to claim 7, further including reducing a supply voltage level 
while maintaining a predetermined overall system wait time. 

11. The method according to claim 10, wherein the predetermined overall system wait 
time corresponds to an overall system wait time associated with non-threaded processing. 
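
The trade described in claims 8 through 11 (lower frequency or supply voltage while still meeting the wait time of non-threaded processing) can be sketched numerically. Everything below is an assumption made for illustration: the single-server M/M/1 wait model, the hypothetical service rates, the premise that service rate scales linearly with clock frequency, and the standard first-order CMOS approximation that dynamic power is proportional to V^2 * f.

```python
import math

def mm1_wait(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue (assumed model)."""
    if service_rate <= arrival_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)

def min_frequency_scale(wait_time_at, target_wait, tol=1e-9):
    """Smallest frequency scale s in (0, 1] with wait_time_at(s) <= target_wait.

    Assumes wait_time_at is non-increasing in s. Returns None when even
    full frequency (s = 1) misses the target.
    """
    if wait_time_at(1.0) > target_wait:
        return None
    lo, hi = 0.0, 1.0  # invariant: hi always meets the target
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if wait_time_at(mid) <= target_wait:
            hi = mid
        else:
            lo = mid
    return hi

def relative_dynamic_power(freq_scale, voltage_scale):
    """First-order CMOS estimate: dynamic power scales with V^2 * f."""
    return voltage_scale**2 * freq_scale

# Hypothetical numbers: the threaded design serves tasks faster at full clock.
arrival_rate = 3.0
non_threaded_rate = 4.0
threaded_rate = 6.0

target = mm1_wait(arrival_rate, non_threaded_rate)
scale = min_frequency_scale(
    lambda s: mm1_wait(arrival_rate, s * threaded_rate), target
)
print("frequency scale:", scale)
print("relative power at unchanged voltage:", relative_dynamic_power(scale, 1.0))
```

The predetermined wait time here is the one computed for the non-threaded baseline, matching claims 9 and 11; any voltage reduction that the lower frequency permits would reduce the estimate further through the squared term.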

12. A circuit designed in accordance with claim 1. 

13. A method of scheduling processing in a hardware threaded circuit, comprising: 

receiving inputs corresponding to unthreaded processing of an application; 

receiving information including processing element resources, a number of 
processing elements, and a window size corresponding to a number of downstream 
processing states to be examined; and 

generating a hardware threaded schedule for processing the application with at 
least first and second ones of the processing elements being interconnected to enable 
dynamic resource sharing. 
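
One possible reading of the scheduling method of claim 13 is sketched below. The claim does not specify an algorithm, so the greedy windowed list scheduler here is an assumption, as are the data layout (each unthreaded state is a resource type plus the set of earlier states it depends on), the function name, and the modelling of dynamic resource sharing as a single pool of resources across all interconnected processing elements.

```python
def threaded_schedule(states, per_pe_resources, num_pes, window):
    """Greedy windowed scheduler (illustrative, not the claimed method).

    states: list of (resource_type, deps) in unthreaded order, where deps is
            a set of indices of earlier states that must finish first.
    per_pe_resources: units of each resource type inside one processing element.
    window: how many downstream pending states are examined per cycle.
    Returns a list of cycles, each a list of state indices issued together.
    """
    # Interconnected elements are modelled as sharing one resource pool.
    capacity = {r: n * num_pes for r, n in per_pe_resources.items()}
    pending = list(range(len(states)))
    done = set()
    cycles = []
    while pending:
        free = dict(capacity)
        issued = []
        for idx in pending[:window]:
            resource, deps = states[idx]
            # A state may issue only if its inputs finished in earlier cycles.
            if deps <= done and free.get(resource, 0) > 0:
                free[resource] -= 1
                issued.append(idx)
        if not issued:
            raise ValueError("no state in the window can be issued")
        cycles.append(issued)
        done.update(issued)
        pending = [i for i in pending if i not in done]
    return cycles

# Hypothetical five-state flow: two multiplies feeding a chain of adds.
states = [
    ("mul", set()),
    ("add", {0}),
    ("mul", set()),
    ("add", {2}),
    ("add", {1, 3}),
]
per_pe = {"mul": 1, "add": 1}

print(threaded_schedule(states, per_pe, num_pes=2, window=3))
print(threaded_schedule(states, per_pe, num_pes=2, window=1))
```

With a window of one state the scheduler degenerates to the unthreaded order, one state per cycle; a wider window lets independent downstream states issue early when pooled resources are free.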

14. The method according to claim 13, further including synthesizing the hardware 
threaded schedule to an Application Specific Circuit (ASC). 

15. The method according to claim 14, further including synthesizing the hardware 
schedule to maximize throughput. 

16. The method according to claim 14, further including synthesizing the hardware 
threaded schedule to reduce power consumption. 
17. The method according to claim 13, further including receiving resource constraint 
information for the processing elements. 



18. A hardware threaded circuit system, comprising: 

a memory; 

a task manager coupled to the memory; and 

a plurality of processing elements coupled to the task manager, wherein first and 
second ones of the plurality of processing elements are interconnected for hardware 
threaded processing to enable dynamic borrowing of processing resources associated with 
the second one of the plurality of processing elements by the first one of the plurality of 
processing elements. 
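
The dynamic borrowing recited in claim 18 can be pictured with a toy software model. This is only a behavioural sketch under assumed names: the `ProcessingElement` class, its `neighbor` link, and the `acquire` method are hypothetical and say nothing about how the interconnection is built in hardware.

```python
class ProcessingElement:
    """Toy model of a processing element with countable resources."""

    def __init__(self, name, resources):
        self.name = name
        self.free = dict(resources)  # resource type -> free units
        self.neighbor = None         # interconnected element, if any

    def acquire(self, resource):
        """Take one unit, preferring local resources, then the neighbor's.

        Returns the name of the element that supplied the unit, or None
        when neither element has a free unit of that type.
        """
        if self.free.get(resource, 0) > 0:
            self.free[resource] -= 1
            return self.name
        other = self.neighbor
        if other is not None and other.free.get(resource, 0) > 0:
            other.free[resource] -= 1  # borrowed over the interconnection
            return other.name
        return None

first = ProcessingElement("PE1", {"mul": 1})
second = ProcessingElement("PE2", {"mul": 1})
first.neighbor = second

print(first.acquire("mul"))  # served locally
print(first.acquire("mul"))  # borrowed from the second element
print(first.acquire("mul"))  # nothing left to borrow
```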



19. The system according to claim 18, wherein the circuit maximizes throughput. 

20. The system according to claim 18, wherein the circuit reduces power consumption 
compared to non-threaded processing for substantially similar system wait times. 

21. The system according to claim 18, wherein the first and second processing elements 
each include a first type of resource and a second type of resource and a multiplexer such 
that the interconnection includes at least one input signal being provided to the first type of 
resource in the first and second processing elements. 

22. The system according to claim 21, wherein the interconnection includes a connection 
from an output of the second processing element first type of resource to the first 
processing element. 